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An numerical iterative framework for global modal stability analysis of compressible flows 
using a parallel environment is presented. The framework uses a matrix-free implementation 
to allow computations of large scale problems. Various methods are tested with regard to 
convergence acceleration of the framework. The methods consist of a spectral Cayley trans¬ 
formation used to select desired Eigenvalues from a large spectrum, an improved linear solver 
and a parallel block-Jacobi preconditioning scheme. 
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1 Introduction 

In fluid dynamics, it is crucial for many technical applications to obtain stability information of the flow. 
While the stability of certain flows like the inviscid hyperbolic-tangent velocity profile is well studied 
|Blu70l IMic64j . the global stability analysis of general flows is still a daunting task. Traditionally a 
stability analysis by e.g. ARPACK |LSY98j is performed by obtaining a linearized matrix representation 
of the discretized operator and using it to perform the analysis. 

This kind of computation is usually limited by the available memory and computational resources. 
But certain physical phenomena require a highly resolved discretization and high order discretization 
schemes, for example the computation of acoustic modes generated by a compressible flow. This leads to 
an immense memory requirement during a global stability analysis, since it requires information about 
the describing operator in the whole computational domain. This encourages the use of a matrix free 
method to perform the stability analysis, which lessens the required memory significantly. 

This article presents a framework which allows a matrix free computation of a global linear tempo¬ 
ral stability analysis of compressible flows using a high order spatial discretization of the compressible 
Navier-Stokes equations. Compared to previous applications of this method, for example by |Mac09) . the 
numerical stability analysis can be performed in a parallel manner. 

We focus on the the temporal linear stability analysis of compressible flows. Being able to compute the 
stability analysis without the need of an operator in matrix formulation leads to a considerable reduction 
in required memory during a computation. In turn, certain challenges which arise because of the matrix 
free method have to be considered. 

The presented framework employs iterative Krylov methods in order to obtain the desired Eigeninfor- 
mation. Krylov methods are a natural choice for this kind of problem since they can be parallelized well 
and can operate without explicit knowledge of the operator. The base method of choice is the implicitly 
restarted Arnold! method (IRAM) |Sor02| . It is combined with a spectral Cayley transformation to facil¬ 
itate in the selection of the correct Eigenvalues and to provide a speedup of the computation |MSR.94) . 
The use of this transform introduces a new linear system which has to be solved. An improvement of a 
common iterative method (GMRES) is tested as well as a preconditioning scheme in order to improve 
convergence. 
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Validation of the method is performed on a compressible mixing layer setup using a tangent hyperbolic 
profile as investigated by Blumen |Blu70) . The growth rate of the most unstable mode of this profile is 
computed using the proposed framework and compared to literature. 

This article is structured as follows: Section 2 of this paper will discuss the physical model used in the 
stability analysis. Section 3 will explain in detail the concepts used in the stability analysis framework and 
the parallelization of it. To close, a model problem and results based on it are presented and evaluated 
in sections 4 and 5. 


2 Governing equations 


For the problems considered herein, the compressible Navier-Stokes 0 equations are used as the governing 
equations. They are given in a characteristic type formulation ISesOO) and are used in a matrix free fashion. 
Note that the use of this special formulation is not required for the framework, other formulations can be 
applied as well. The formulation introduces the primitive variables p as pressure, Ui as the velocity vector 
and s as entropy. Additionally, Tij describes the stress tensor, T the temperature, <I> the dissipation and 
p the density, with p, and py describing the viscosity and bulk viscosity of the fluid, respectively. 
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This set of equations is closed with Sutherland’s law and the ideal gas equation and uses the summation 
convention with i = 1,2,3. 

Going forward, the short hand operator N{q) will be used to refer to the Navier Stokes equations, with 
q containing all primitive flow variables q= \p,Ui,s\. 

Within the used scheme, the wave propagation terms are discretized with a sixth order finite difference 
scheme using compact finite difference schemes by |Lel92) on a staggered grid, which allows an accurate 
computation required to extract the stability information. The dissipation terms were discretized using a 
similar scheme of fifth order. 


3 Numerical stability analysis 

Ansatz and overview 

In order to perform a global stability analysis, we need to formulate an appropriate ansatz which allows 
the computation of the temporal stability properties of the flow. 

q = J{qo)q (2) 

Therein J{qo) describes the temporal evolution of the operator A/iin = J{qo), linearized around a base 
state qo- To this end we investigate the temporal stability. The goal of the stability analysis is to obtain 
the Eigenvalues A of the Operator. For this, an ansatz fitting to the problem has to be chosen. The form 
of the ansatz depends on the structure of the investigated flow, especially on its degree of homogeneity. 


2 









For a general three dimensional flow without any degree of spatial homogeneity, the general approach Q 
can be used for the stability analysis, as given by |M ac og. 

q{x,y,z,t) = (j){x,y,z) ex.p{-iact) (3) 

Herein, a denotes the wave number of the temporal disturbance, c represents the wave propagation speed, 
while c/){x,y,z) describes the Eigenvector corresponding to a. Combining the model equation (|^ with the 
general approach © results in an Eigenvalue problem Q which has to be solved. 

J{qo)4>{x,y) = X4>{x,y) (4) 

An Eigenvalue problem of this type can be solved directly if the Jacobian matrix J{qo) is known. Since 
the memory requirements for storing this matrix are too high for global stability analysis problems, we 
have to employ matrix free methods to solve the it. 

Matrix free framework 

In order to compute the Eigenvalues needed for the stability analysis, the Jacobian of the operator solving 
the defining equations Q is required. Since the defining equations are nonlinear and not available in a 
matrix formulation, an approximation of the Jacobian has to be computed. Therefore a Frechet-type 
derivative ([^ is employed for the linearization of the nonlinear operator during the stability analysis. 

(5) 

The correct choice of the parameter e required for this derivative is critical to its accuracy. We used the 
findings on the influence of e given by |SSS09) and set it accordingly to e = ||g'||yAm, with Cm representing 
the machine accuracy of the system. The use of this linearization requires the evaluation of the defining 
equations AA(g) every time the Jacobian is required during the computation. This application is usually 
one of the most numerically expensive operations and one should therefore try to minimize the number 
of required calls. 

The use of a matrix free method to obtain information about J without explicitly storing it leads 
naturally to the use of iterative Krylov subspace methods, which do not require the Jacobian in explicit 
form but rather just matrix vector products of the type J(go)</>(a;,y)- These matrix vector products can 
be computed in a matrix free fashion by using the previously described Frechet derivative ([^. 

Krylov methods 

Since the framework is designed to work in a matrix free environment, a method which can operate in 
such an environment is required for the stability analysis. Krylov methods suite this requirement well and 
will therefore be used as method of choice. The idea of Krylov methods is to iteratively create a sequence 
of subspaces /Cm which meets Q. 

iClCMinTo) C /C2(A/jinTo) C ... C /Cm(MinTO) (6) 

As stated before, Min = Jiqo) is a (linearized) matrix representation of the Jacobian of M{q). Note that 
due to the iterative nature of Krylov methods only the product given by the Frechet derivative (|^ is 
required. 

The basis Vm of these subspaces can then be used to compute the Arnold! relation (Q, which projects 
a portion of Min onto the Krylov subspace. This results in a Hessenberg matrix Hm that is usually much 
smaller than Min and contains an approximation of the eigenvalues and eigenvectors of the operator. 

MinlM = VmHm + /mCm (7) 

In Q, fmSm represents the last component of the residual vector of the size m Arnold! decomposition, 
which is visualized in Fig. 
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Figure 1: Visualization of the Arnoldi relation, see |Sor 02]. 

Implicitly restarted Arnoldi method 

In order to solve the eigenvalue problem Q iteratively, the implicitly restarted Arnoldi method (IRAM) 
developed by |Sor02j is used. It offers a robust method for obtaining Ritz pairs - approximations to 
the Eigenvalues and Eigenvectors of the operator A/". Its main benefits are twofold. On one hand, it 
is a Krylov-type method and therefore does not need the explicit Jacobian of the operator. On the 
other hand, the IRAM allows a certain degree of control concerning which part of the spectrum is to be 
computed. This feature is especially important in order to obtain the desired results in a manageable 
time frame. In essence, it is a restarted Arnoldi method which allows the removal of unwanted Eigenvalue 
information during the restart of the iteration by using a QR-shift. A schematic overview of the algorithm 
is presented in algorithm An important aspect of the IRAM with respect to the stability analysis is 
to define a suitable convergence criterion to determine when the desired Ritz pairs have been computed. 
As pointed out by |LS96) . choosing an appropriate criterion is difficult for cases involving non-Hermitian 
operators which occur often when analyzing compressible flows since the residual norm ||/m|| of the Arnoldi 
decomposition does not necessarily imply convergence as opposed to the Hermitian case. To ensure a small 
backward error, a Ritz estimate (|^ proposed by )LS96) is used. This criterion is applied to each computed 
Ritz pair {x, 6). 

Il/ffi|| l^m^l — Oaax(em| I I, to/iraml^l ) (8) 


Cayley transformation 

While the selective properties of the IRAM are very useful for this task, they are sometimes not sufficient 
when considering stability problems of compressible flows, which lead to problems due to clustering of 
Eigenvalues close to the origin or when dealing with badly conditioned flows. In order to enhance the 
selection properties of the framework even further, a Cayley transformation as proposed by |MSR94] , 
is applied to the Eigenvalue problem Q. 

C’(Min) = (Min - al)-^ (Min " ^^I) (9) 

This leads to a generalized Eigenvalue problem of the type 

(A/iin — (A/jin (^0) 

which has to be solved. The solution of this linear system requires an iterative solver. Due to the nature 
of the iterative solvers, the Cayley transformation is therefore only applied inexactly. 


4 

















Algorithm 1 Implicitly restarted Arnold! factorization |Sor02) . Parallel parts are represented by a 
“parallel” comment. 

Start with a m = k + p step Arnold! factorization (algorithm 

— ^mHm fm^'m ^ parallel 

while not converged do 

compute spectrum(ifm)) select p shifts I 2 i,p 2 , ■ ■ ■ 
perform implicit QR-Shifts: 

Q — Im 

for j = 1,... do 

call qr factorization(Pfm ~ k’jl) [Qj^ Rj\ 

Hm ^ Q*jHmQj 

Q ^— QQj 

end for 

update the Arnold! factorization: 

Pk = Hm{k + l,k) 
ak = Q{m,k) 
fk ~ '^k+lh fm^k 
Vk ^ VmQ{: ,l-.k) 

Hk ^ HmQiX '■ k,\ : k) 

convergence check using (|^ > parallel 

use obtained k-step factorization as starting point: 

= VkHk + fke*k 

apply p steps of Arnold! method to obtain new factorization 

MinKn = VmHm + 7^6^ > parallel 

end while 


To illustrate how the Cayley transformation affects the spectrum of the transformed operator, Fig. 
shows a comparison of a sample spectrum with and without a Cayley transformation. Depending on the 
parameters p and a, the transformation has two effects on the shape of the spectrum. It introduces a 
rotation as well as a stretching effect. Therefore, it can be used to separate clusters of Eigenvalues and 
to rotate desired Eigenvalues into the part of the spectrum with positive Eigenvalues, which are favored 
by Arnold! based methods. Another important feature of this transform is to map inner Eigenvalues to 
the outer part of the spectrum. This is important since Arnold! based method usually converge first onto 
Eigenvalues distant from the origin. 


Iterative solvers 


The basic solver for the solution of (10) is a typical restarted GMRES based on |SS86j . which was chosen 
for its robust numerical behaviour. Other methods are also applicable, for example BiCGstab. 

Since the solution of (10) usually incurs a large numerical effort, it is beneficial to reduce the iterations 
required to solve it as much as possible. We investigate the possible speedup of a modified GMRES version 
and compared them to the standard formulation. The analyzed LGMRES uses error approximations to 
augment the Krylov subspace between restart cycles |BJM05) . The variant aims at improving the speed 
of convergence of the traditional restarted GMRES by lessening the impact of the necessary restart. 


Parallelization 

In order to compute large scale stability analysis, parallelization of the framework is required. Since the 
framework is based on IRAM, it lends itself well to parallelization since the numerically most expensive 
steps, the construction of the new Krylov subspaces, can be distributed among available GPUs such 
that each process does not require information about the entire computational domain. This reduces 
the required communication between the processes significantly. The steps that require parallelization are 
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Figure 2: Comparison between an untransformed spectrum and its Cayley transformed counterpart, using 
= 440.5, a = 1200.23. Also showing transformation of two Cartesian lines. 


marked in the respective algorithm scheme such as Listing The implementation used in the scope of this 
work achieves parallelization by parallelizing the derivatives using MPI, but other means of parallelization 
are applicable as well. 

Preconditioning 

Preconditioning techniques are often necessary to improve the speed of convergence when solving large 
linear systems arising during the stability analysis. While many methods for preconditioning are well 
researched, their application to a matrix free and parallel case is not straightforward. Since a global 
preconditioner is difficult to assemble due to the lack of the operator matrix, a local approach for the 
preconditioner was chosen for this work to serve as a proof of concept for matrix free preconditioning in 
parallel stability analysis. 

The framework uses a local block-Jacobi scheme similar to |HS91) which assembles a local preconditioner 
matrix on each process and uses it to accelerate the solution of the linear system. While this approach is 
rather simple, it serves as a starting point for future refinement. 


Numerical framework overview 

The proposed framework can be roughly divided into two parts, an outer iteration loop consisting of 
the IRAM-algorithm |Sorn2| . which obtains the stability information, and an inner iteration loop for 
the solution of the linear system required for the application of the inexact Cayley transformation. The 
relationship between these is depicted in Fig. 

An application of the operator A/im is required for each step of the inner iteration. Each of these calls 
of the operator translates to one evaluation of the dehning equations Q. This is usually the step which 
requires the most computational resources. It is also the “innermost” step of the framework and therefore 
the most called. 
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Figure 3: Schematic overview of the stability framework. 


4 Model problem 


To illustrate the proposed framework, a model problem is used in the scope of this paper. We investigate 
the temporal stability of two-dimensional compressible mixing layer using a hyperbolic tangent velocity 
orofile similar to the works of Blumen |Blu70j. The orofile used is described bv eouation 0 , which 
imposes a velocity profile in the xi-component of the velocity onto the stream. The thickness of the shear 
layer is controlled by the parameter 6. 


Ui{x2) 


Uoo tanh 


— 0.5 • L2^ 


( 11 ) 


with Uoo as reference velocity and L 2 as domain length in X 2 direction. The model problem is set up in 
a computational domain with 128 x 512 points as depicted in Fig. It employs a periodic boundary 
condition on the left and right boundaries as well as a non-reflective boundary condition on the lower and 
upper boundaries. Additionally, grid stretching was used to improve the resolution in the critical parts 
of the domain where the mixing effect occurs. This leads to a minimum cell distance in X 2 - direction of 
~ 0.08J centered towards to highest velocity gradients present in the profile. 

The tangent hyperbolic velocity profile is seeded into the domain as a starting condition. In order to 
test the parallel aspect of the framework, the computation is distributed between at least 4 processors. 

This model problem is investigated with regard to its linear temporal stability. The goal is to identify 
the least stable mode by using the previously outlined framework. The domain length in xi-direction Li 
should fit at least one entire wavelength of the desired mode, and was chosen to be exactly one wavelength 
of the least stable mode scaled by mixing layer thickness <5. Assuming a wavenumber a of the least stable 
mode, the length of the domain in xi-direction would result as The domain length in y-direction 

was chosen as 8 times Li. 

Compared to the traditional matrix based approach on stability analysis, the presented matrix free 
framework requires significantly less memory. Consider the above model problem with a 128 x 512 points 
resolution. Assuming double precision, storing the operator alone would require (128-512)^-64Bit ~ 32GB 
of memory. When performing a parallel computation, this requirement is shared between all available 
nodes. The memory requirements of the matrix-free approach are tested in the following Sec. 


5 Validation 

The proposed framework is validated using the model problem described in the previous section. The 
least stable modes for a various settings are computed and their growth rates compared to literature. 


7 






















Figure 4: Draft of the used computational domain showing dimensions and the hyperbolic tangent profile. 


see |Blu7n| IMacn9j . All computations are performed in parallel and cover the different types of iterative 
solvers described in the previous section as well as the preconditioner introduced previously. 

Results - Validation 

The proposed framework is able to compute growth rates of the least stable mode in good accordance to 
the results given by Blumen |Blu70| and Mack |Mac09j . The detailed results are compared in Tbl. The 
different configurations are designed to cover a range of Mach numbers as well as different resolutions at 
a high Reynolds number. The different configuration parameters are listed in Tbl. In addition to the 
computed growth rates, an exemplary mode computed during the validation is presented in Fig. 

From this results we find the growth to be computed in good accordance with literature. 


Configuration 

M 

Re 

Resolution 

VOl 

0.1 

—)• oo 

64 X 256 

V02 

0.5 

—>■ oo 

64 X 256 

V03 

0.9 

—>■ oo 

64 X 256 

V04 

0.5 

1000 

64 X 256 

V05 

0.1 

—>• oo 

16 X 64 

V06 

0.5 

—)■ oo 

16 X 64 

V07 

0.9 

—>■ oo 

16 X 64 

VOS 

0.5 

1000 

16 X 64 


Table 1: Configuration parameters for different validation setups. 


6 Numerical Experiments 

As mentioned previously, the computational costs and required memory of the stability problem increases 
rapidly when considering problems which require high resolutions. While the memory requirements are 
mitigated to a great extent by the use of a matrix-free framework, the computational effort required is 
still significant. It is therefore important to reduce this computational effort, for example by improving 
the convergence behavior of the stability framework to avoid unnecessary iterations. 
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Conhguration 

computed growth rate ix 

uj by Blumen 

UJ by Mack 

VOl 

0.189 

0.187 

0.187521 

V02 

0.1411 

0.141 

0.1411 

V03 

0.0547 

0.055 

0.054723 

V04 

0.1273 

n/a 

0.1271 

V05 

0.189 

0.187 

0.187521 

V06 

0.139 

0.141 

0.141167 

V07 

0.0540 

0.055 

0.054723 

VOS 

0.1270 

n/a 

0.1271 


Table 2: Results of the validation for various setups. 


This section therefore investigates the effectiveness of the preconditioning method and iterative solvers 
with regard to speed up the iteration based on the results of numerical experiments. Since one of the 
main objectives of the presented framework is its matrix-free and parallel implementation, we will discuss 
the scalability and especially memory aspects based on the numerical experiments. 

Lastly, while the choice of Cayley parameters has an influence on the performance of the framework, 
the presented numerical experiments do not cover a study on these parameters, since the focus of this 
work is to provide an overview of the framework. For a more detailed description of the effects of the 
Cayley transform, the reader is referred to |MSR94| or |Mac09j . Still, a few general remarks on the choice 
of Cayley parameters for this case are summarized at the end of this section. 

The numerical experiments are based on the model problem discussed previously and differ in resolution, 
type of preconditioning and iterative solver used to cover a variety of possible combinations. 

All further experiments share the following basic parameters: A Mach number M = 0.5, Reynolds 
number Re =—)• oo, domain length in xi- direction Li = 15.83(5, tolerance of the IRAM to/iram=ie -5 and 
tolerance for the GMRES-type method to/gmres = le — 6. The subspace size m of the IRAM was varied 
from 100 — 250, depending on the resolution of the case, with lower resolutions using smaller subspace 
sizes. 

Preconditioning Scheme 

Aside from the growth rate as a measure of the correctness of the computation, the number of required 
matrix-vector product operations is recorded. Since these correlate directly with the amount of required 
operator evaluations - the by far most numerically expensive part in these simulations. They serve as a 
good measure to compare different methods regarding speedup. Fig. [^compares different computations, 
using the Block Jacobi preconditioner as well as running without preconditioning. A reduction in needed 
iterations for the preconditioned cases is clearly visible in the comparison. The reduction differs from case 
to case, but usually ranges 40-60%. 

Aside from the speedup aspect another equally important aspect of the preconditioning became apparent 
during the numerical experiments. Since the stability problem arising from compressible flows is usually 
conditioned very poorly, computations without preconditioning often fail to produce accurate results. 
The use of a preconditioner alleviated this problem by enabling the computation of viable results in these 
cases and making the computation more robust. The resolutions used for the results above was therefore 
chosen small to allow a comparison to the unpreconditioned case. 

Linear Solvers 

Aside from the speedup achieved by the preconditioning, the speedup of the LGMRES solver is also in¬ 
vestigated by numerical experiments. Tablej^has exemplary results for the effectiveness of the LGRMES- 
method compared to standard GMRES. 
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Figure 5: Least stable mode computed with resolution 128 x 512 using 64 CPUs. One period in x-direction 
is shown. Depicted are fields pressure p and the velocity component U 2 = v. Also, the figure 
shows the underlying grid used in the computation. 
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Figure 6: Comparison of the iteration process of one call of the linear solver for five different setups using 


LGMRES. 
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Scaling behaviour 



Figure 7: Scaling behavior of the framework. 


In comparison to the effectiveness of the preconditioner, the speed up achieved by this method is rather 
small. Given the very low additional numerical effort incurred by this method compared to standard 
GMRES, the use of LGMRES is still advisable. 


Memory and computation time considerations 

In order to highlight the effects of the different methods with respect to speedup, a comparison between 
selected setups is displayed in Tbl. The table compares the different configurations in terms of required 
iterations as well as required memory during the computation. Note that the computation using the higher 
resolution was not possible without the use of a preconditioner. The memory requirement displayed therein 
is estimated analytically. The table contains several important aspects of the framework. The first to be 


no precond. 
no precond. 
block Jacobi 
block Jacobi 


resolution 
256 X 64 
1024 X 128 
256 X 64 
1024 X 128 


GMRES memory* LGMRES memory* 


1419325 0,1GB* 

/ 

65164 4.1GB 

1459950 256,4GB 


1389364 0,1GB* 

/ 

63472 4.1GB 

1411357 256,4GB 


Table 3: 


Iteration times for different solvers, preconditioning methods and resolutions for the model prob¬ 
lem. (*) The memory requirement is shared across the nodes. 


noted is the memory requirement of the method is heavily influenced by the method of preconditioning. 
In its current iteration, the use of the Block Jacobi method requires a significant amount of memory 
when investigating higher resolutions compared to the unpreconditioned case. This is a trade-off for the 
drastic reduction in required iterations compared to the unpreconditioned computations. For the examples 
provided in the given table, the preconditioned computation required roughly 20 times less iterations, a 
significant speedup even when considering the memory requirements of this method. However, the memory 
requirement is shared over the parallelized nodes. 

As an additional metric, the scaling of the computational time and the memory requirement is collected 
in Fig. to give an example for the scaling behavior of the framework when using the Block-Jacobi 
preconditioner. The scaling observed in this case massively deteriorates from 64 processors onward. This 
can be explained by the implementation of the preconditioner which requires a huge amount of calls 
of M{q) (ergo the right hand side of the considered equation) to compose the preconditioner prior to its 
application. With increasing number of processors used, the numerical costs of the composition surpass the 
costs of the solution of the preconditioned linear system, since the calls of M{q) require MPI calls. While 
this process is parallelized, it still acts as a bottleneck when using a relatively large number of processors 
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Memory scaling per node 



Figure 8: Theoretical memory scaling of the framework with preconditioning. 

- for the parameters given, this number is at 64 processors. Furthermore, the preconditioner cannot be 
omitted despite deteriorating the scaling of the framework since the preconditioner is a requirement to 
enable the correct computation in the first place. 

Due to this behavior, the improvement of the preconditioner seems to be the most promising route to 
consider for improving the performance of the framework. As with the framework in general, the goal is 
to reduce the number of required right hand side calls. This could be achieved by a using non compact, 
lower order discretization of the operator M{q). This would significantly reduce the MPI communication 
during operator calls while also having the possible benefit of reducing the memory required during the 
composition of the preconditioner, [M ac09| . Fig. shows the estimated memory requirement per node 
when using preconditioning. The actual memory requirements during numerical experiments the scale 
nearly linearly with the number of used processors, except for a small overhead which results in a sub- 
linear scaling. This is likely dependant on the method of parallelization used in the framework, although 
this was not investigated in the scope of this work. 

Cayley parameters and further remarks 

The numerical experiments performed for this work generated further observations which are not covered 
in the above sections, yet can still be useful to get a better understanding of the problem at hand and to 
serve as guidelines for common problems. 

Firstly, the condition of the linear systems deteriorates with increasing resolution. This is to be expected 
since an increase in resolution leads to a higher density of computable modes in the mixing layer. This 
results in an increased clustering of Eigenvalues which can affect the condition of the linear system. 
Additionally, a higher resolution can also result in an increase of unstable modes introduced by the 
discretization scheme. There are several ways to lessen the impact of these phenomena. 

A first measure to be considered when dealing with the increased resolution is to increase the sizes 
of the subspaces available to the IRAM and GMRES. This step is usually required to account for the 
increased complexity of the computed spectrum and comes with an increase in memory and computational 
resources. For the resolutions investigated in this works, subspace sizes varying from 100 to 250 were found 
adequate. 

In addition to the increase in subspace size, an adjustment of the Cayley parameters can also lead to an 
improvement of the computation for higher resolutions. Unfortunately, the method for determining the 
correct parameters is not as straightforward compared to the subspace size since the Cayley parameters 
are seemingly not correlating with the resolution in a simple fashion. For the resolutions considered herein, 
the following parameter sets in Tbl. have been used successfully. While the Cayley parameters can 
influence the number of iterations required by the linear solver, the effect of fine tuning the parameters 
was not found significant. 


12 








resolution 
32 X 128 
32 X 256 
64 X 512 
128 X 1024 


recommended parameters 
^ = 230.5, a = 800.32 
// = 444.5, a = 1200.32 
/i = 444.5, a = 1200.32 
/i = 444.5, a = 1200.32 


Table 4: Recommended parameters for different resolutions of the mixing layer model problem. 


7 Conclusions 

A framework for the matrix free and parallel stability analysis of compressible flows has been presented 
and validation results and speedup experiments have been showcased. To enhance the performance of 
the framework, a straightforward parallel preconditioning method was presented and tested successfully. 
The methods presented herein can be seen as a proof of concept for the matrix free stability analysis in 
parallel environments. 

Furthermore, the numerical experiments provide several ways that can be explored for a further increase 
in efficiency of the computation. As mentioned previously and judging from the results of the numerical 
experiments, the preconditioning scheme appears to be the most promising route to explore for further 
improvements. Two aspects can be considered as an angle for improvements to the preconditioning scheme. 

Firstly, the construction of the preconditioner itself could be handled in a more sophisticated manner 
regarding memory requirements and method of construction. For example, rather than using the whole 
local operator as base for the preconditioning matrix, one could revert to using a reduced operator which 
still covers the physics but uses a much lower order finite difference scheme, as employed by |Macn9) in 
the non parallel case. This would reduce the memory required by the preconditioner while also facilitating 
its construction. 

Secondly, the inversion and application of the preconditioning matrix offers room for improvement, e.g. 
by using sparse techniques for saving the matrix if the structure of the matrix is suitable. 

While two different linear solvers were compared, the impact of the chosen method was found to be 
small compared to the impact of the preconditioning method. Therefore, while other methods could still 
be investigated with respect to speed up, the study of the preconditioning method seems to be more 
efficient, especially. 
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